Interview with Dennis Ritchie, Creator of the C Language
Some people become important because they change history; others make
history. Dennis Ritchie belongs to the second group. While most of us were
still learning to walk, he developed C, the most widely used programming
language. There is no need to stress the importance of this contribution to
humanity...
Question: Just as many children want to be Superman, you are the idol of
many C programmers and UNIX fans (among others) around the world. How does it
feel to be idolized by thousands of C and UNIX programmers? It is completely
impossible for us to imagine ourselves without UNIX or C. When you created C
and started working on UNIX, did you expect it to become "THE FUTURE" of
computing?
Dennis: These two questions are almost the same, and they are asked
frequently. Obviously the recognition and appreciation that my colleagues and
I have received are very pleasant, and we feel that we helped create something
of real value. But no, we did not really expect it to be "the future", nor did
we anticipate the eventual influence of the work. It was carried out in the
spirit of "let's build something useful" while also doing the work needed to
help others take part. It is important to keep in mind that although the Unix
and C or C++ segment as a whole is significant, the world of computer science,
technology and real products is much larger. This is true both in the academic
study of programming languages and in the area of software where the money is.
Question: If UNIX is the present and the past of operating systems, C could
surely be considered "THE LANGUAGE", despite all the object-oriented languages
that have appeared in recent years. How do you see C++ and Java, and the wars
of words that often break out between C and C++ programmers?
Dennis: C++ benefited enormously from C, because C was widely accepted even
before the growth of C++, so C++ could use C both as a base on which to build
the new language and as a tool for creating its compilers. C has been
characterized (both admiringly and disparagingly) as a portable assembly
language, and C++ tries to raise its level toward object orientation and a
more abstract approach to programming. The faults of both (in the recently
published standards) seem to be excessive ornamentation and the accumulation
of paraphernalia. Both have a certain spirit of pragmatism, of trying to
understand what is really needed. Java is clearly a descendant of C++,
removing some pointer-related features inherited from C while adding the idea
(not so new, but now perhaps really viable) of machine-independent object
files. Now that it has become entangled in intrigues between Sun and Microsoft
(and has its own problems with ornamentation), it is hard to predict where
things will go.
Question: Now a hypothetical question: from today's perspective and after so
many years of experience with C, is there anything you would have done
differently if you had to design C from scratch?
Dennis: I would look for a way (given the time constraints) to insist on
something that has been in the ANSI/ISO standard for some time: full type
declarations for function arguments, which the 1989 C standard calls function
prototypes. There are many small details that are more confusing than they
should have been. One example is the meaning of the word "static", which is
used confusingly for several purposes. I am still undecided about the
language's declaration syntax, which mimics the way the declared variables are
used. It is one of the things that draws strong criticism, but there is a
certain logic to it.
Question: While C is an established and completely defined language,
operating systems are still evolving considerably. New ideas emerge as
hardware gets faster and cheaper. What are the key issues that will shape
operating system design in the future? In particular, what is your opinion of
micro- and nano-kernels versus monolithic designs?
Dennis: I do not think this is an important issue when seen from that
perspective. Above all, I prefer application environments that provide a
common "name space" and mechanisms for structured access to resources, along
the lines of Unix (here I include Linux), Plan 9 or Inferno. It seems to me
that the idea of micro- or nano-kernels is not really important in actual use,
at least as the basis of a general-purpose system. In practice, what seems to
happen is that the micro-kernel becomes specialized for the macro-system that
runs on top of it. It may remain a useful tool for structuring the system
internally, but it does not live on its own. Of course (the world being
complicated), there are cases in which very simple operating systems are
useful for small devices that are not meant for general use, whether on the
desktop or in a machine room.
Question: UNIX is an operating system with a long history. It was created
many years ago, and since then the capabilities and requirements of networks,
hardware, services and applications have evolved enormously. What are the
current limitations or problems of UNIX in meeting user demands now and in the
near future?
Dennis: I do not see any fundamental technological limitation in terms of
the basic system API. There is, of course, an enormous commercial and
political dimension in the struggle among commercial Unix vendors, and now
among the various suppliers of "free" Unix, including Linux and *BSD.
Question: There has been a great deal of concern lately about the
approaching year 2000 and the potential disaster on the Internet due to the
famous Y2K (year 2000) bug. Do you think there is any basis for the
apocalyptic predictions made by some experts?
Dennis: I have no intelligent comment on the subject, really. Personally, I
will not be flying at 23:59 on December 31, 1999, but only because I have
never been near an airplane on New Year's in my life; this probably has little
to do with Y2K.
Question: This would not be a complete interview without mentioning Inferno,
the operating system you are currently working on. What were the main reasons
for designing a totally new operating system, together with Limbo, its own
programming language? Also, why Inferno/Limbo when the Java VM and Java
already exist? In other words, what can Inferno offer that Java does not?
Dennis: The Inferno project was the idea of Phil Winterbottom and Rob Pike,
and it started just before the Java bandwagon (publicity machine) got rolling.
Java had its own predecessor (known internally as Oak), but at the time
Inferno was taking shape there was still no reason to think the phenomenon
would emerge, and although we kept an eye on Java's development, it had not
yet taken form. I think it is one of those strange convergences that a
venerable technological idea (a language implemented by a portable virtual
machine) was revived both by Sun and by us. Nevertheless, the idea of Inferno
was, from the start, more interesting in terms of operating system technology
(a language and an OS that would run on very cheap, simple, minimal hardware
and also as an application under Windows, Unix or Linux). At the same time,
one has to give Sun credit for connecting better with the vast and dynamic
market for WWW browsers.
Question: It seems to us that the future of Limbo as a programming language
is tied to the spread and popularity of Inferno as an operating system. Would
it make sense to port Limbo to other operating systems? Or are its design and
goals too dependent on Inferno?
Dennis: Technologically, Limbo is not particularly dependent on Inferno. In
practice it is, simply because a new language depends on the environment in
which it is used.
Question: Looking at your career at Bell Labs, it seems that you have always
worked on projects you really liked, and I presume this is also true of
Inferno. Am I wrong in saying that you really enjoyed your work on the design
of UNIX and C?
Dennis: That is true; I have enjoyed my career at Bell Labs (which is still
continuing).
Question: I cannot help comparing you with all the people who are currently
working on non-profit projects just because they enjoy it, although I am sure
they would not turn down money for the work they do for free. Could you
imagine yourself involved in projects like Linux or similar ones if you were
not at Bell Labs? How do you see all these people from the perspective of an
innovative research laboratory, with many years of experience behind you?
Since our magazine is mainly for Linux users, we cannot forget to ask you a
question about Linux. First, what is your opinion of the Linux momentum and of
the decision by many companies to start developing software for it (Bell Labs,
for example: Inferno has its own version for Linux)?
Dennis: Let me take these questions together. I think the Linux phenomenon
is quite delightful, because it draws so strongly on the basis that Unix
provided. Linux seems to be among the healthiest of the direct descendants of
Unix, though there are also the various BSD systems as well as the more
official offerings from the workstation and mainframe manufacturers. I cannot
help observing, of course, that the world of "free source" Unix derivatives
seems to be suffering from exactly the same kind of fragmentation and fighting
that took place, and is still taking place, in the commercial world.
Question: And the big question about Linux: have you ever used Linux? If so,
what do you think of it?
Dennis: So far I have not really used it, in the sense of depending on it
for my own day-to-day work, I am afraid I have to admit. My own computing
world is a strange mixture of Plan 9, Windows and Inferno. I very much admire
the growth and vigor of Linux. Occasionally people ask me the same question,
but posed in a way that seems to expect an answer showing jealousy or
irritation about Linux versus Unix as delivered under the brands of the
traditional companies. Not at all; I see both as a continuation of the ideas
that Ken, I and many others set in motion many years ago.
Question: And Microsoft... What do you think of the monopoly this company
holds over desktop computing? In the past, science fiction films portrayed a
world ruled by giant computers that controlled every aspect of daily life.
Today's reality shows a different picture. Computers have, in many respects,
been relegated to being a simple household appliance. You, who developed an
operating system designed for programmers, who lived through that science
fiction atmosphere and who imagined the current state of computing: how do you
imagine the future of computing? What role do you think Inferno and Linux have
in it?
Dennis: Those are two questions. Microsoft has a certain kind of monopoly
over desktop computing, but that is not the only interesting computing in the
world. Both the alternative ways of supplying software (such as Linux) and the
parts of the world that do not make the news as much as Windows or the browser
wars (such as high-performance computing, high-reliability computing and very
small computing) will all have a place. I am confident that both Linux and
Inferno will prosper.
Dennis M. Ritchie dmr@bell-labs.com
Bell Labs/Lucent Technologies
Murray Hill, NJ 07974 USA
Copyright © 1996 Lucent Technologies Inc. All rights
reserved.
The C programming language was devised in the early 1970s as a system implementation
language for the nascent Unix operating system. Derived from the typeless
language BCPL, it evolved a type structure; created on a tiny machine as a tool
to improve a meager programming environment, it has become one of the dominant
languages of today. This paper studies its evolution.
NOTE:
*Copyright 1993 Association for Computing Machinery, Inc. This electronic
reprint made available by the author as a courtesy. For further publication
rights contact ACM or the author. This article was presented at Second History
of Programming Languages conference, Cambridge, Mass., April, 1993.
This paper is about the development of the C programming language, the influences
on it, and the conditions under which it was created. For the sake of brevity,
I omit full descriptions of C itself, its parent B [Johnson 73] and its
grandparent BCPL [Richards 79], and instead concentrate on characteristic
elements of each language and how they evolved.
C came into being in the
years 1969-1973, in parallel with the early development of the Unix operating
system; the most creative period occurred during 1972. Another spate of changes
peaked between 1977 and 1979, when portability of the Unix system was being
demonstrated. In the middle of this second period, the first widely available
description of the language appeared: The C Programming Language, often
called the `white book' or `K&R' [Kernighan 78]. Finally, in the middle
1980s, the language was officially standardized by the ANSI X3J11 committee,
which made further changes. Until the early 1980s, although compilers existed
for a variety of machine architectures and operating systems, the language was
almost exclusively associated with Unix; more recently, its use has spread much
more widely, and today it is among the languages most commonly used throughout
the computer industry.
The late 1960s were a
turbulent era for computer systems research at Bell Telephone Laboratories
[Ritchie 78] [Ritchie 84]. The company was pulling out of the Multics project
[Organick 75], which had started as a joint venture of MIT, General Electric,
and Bell Labs; by 1969, Bell Labs management, and even the researchers, came to
believe that the promises of Multics could be fulfilled only too late and too
expensively. Even before the GE-645 Multics machine was removed from the
premises, an informal group, led primarily by Ken Thompson, had begun
investigating alternatives.
Thompson wanted to create a
comfortable computing environment constructed according to his own design,
using whatever means were available. His plans, it is evident in retrospect,
incorporated many of the innovative aspects of Multics, including an explicit
notion of a process as a locus of control, a tree-structured file system, a
command interpreter as user-level program, simple representation of text files,
and generalized access to devices. They excluded others, such as unified access
to memory and to files. At the start, moreover, he and the rest of us deferred
another pioneering (though not original) element of Multics, namely writing
almost exclusively in a higher-level language. PL/I, the implementation
language of Multics, was not much to our tastes, but we were also using other
languages, including BCPL, and we regretted losing the advantages of writing
programs in a language above the level of assembler, such as ease of writing
and clarity of understanding. At the time we did not put much weight on portability;
interest in this arose later.
Thompson was faced with a
hardware environment cramped and spartan even for the time: the DEC PDP-7 on
which he started in 1968 was a machine with 8K 18-bit words of memory and no
software useful to him. While wanting to use a higher-level language, he wrote
the original Unix system in PDP-7 assembler. At the start, he did not even
program on the PDP-7 itself, but instead used a set of macros for the GEMAP
assembler on a GE-635 machine. A postprocessor generated a paper tape readable
by the PDP-7.
These tapes were carried
from the GE machine to the PDP-7 for testing until a primitive Unix kernel, an
editor, an assembler, a simple shell (command interpreter), and a few utilities
(like the Unix rm, cat, cp commands) were completed. After this point,
the operating system was self-supporting: programs could be written and tested
without resort to paper tape, and development continued on the PDP-7 itself.
Thompson's PDP-7 assembler
outdid even DEC's in simplicity; it evaluated expressions and emitted the
corresponding bits. There were no libraries, no loader or link editor: the
entire source of a program was presented to the assembler, and the output file
(with a fixed name) that emerged was directly executable. (This name, a.out,
explains a bit of Unix etymology; it is the output of the assembler. Even after
the system gained a linker and a means of specifying another name explicitly,
it was retained as the default executable result of a compilation.)
Not long after Unix first ran
on the PDP-7, in 1969, Doug McIlroy created the new system's first higher-level
language: an implementation of McClure's TMG [McClure 65]. TMG is a language
for writing compilers (more generally, TransMoGrifiers) in a top-down,
recursive-descent style that combines context-free syntax notation with
procedural elements. McIlroy and Bob Morris had used TMG to write the early
PL/I compiler for Multics.
Challenged by McIlroy's
feat in reproducing TMG, Thompson decided that Unix—possibly it had not even
been named yet—needed a system programming language. After a rapidly scuttled
attempt at Fortran, he created instead a language of his own, which he called
B. B can be thought of as C without types; more accurately, it is BCPL squeezed
into 8K bytes of memory and filtered through Thompson's brain. Its name most
probably represents a contraction of BCPL, though an alternate theory holds
that it derives from Bon [Thompson 69], an unrelated language created by
Thompson during the Multics days. Bon in turn was named either after his wife
Bonnie, or (according to an encyclopedia quotation in its manual), after a
religion whose rituals involve the murmuring of magic formulas.
BCPL was designed by Martin
Richards in the mid-1960s while he was visiting MIT, and was used during the
early 1970s for several interesting projects, among them the OS6 operating
system at Oxford [Stoy 72], and parts of the seminal Alto work at Xerox PARC
[Thacker 79]. We became familiar with it because the MIT CTSS system [Corbato
62] on which Richards worked was used for Multics development. The original
BCPL compiler was transported both to Multics and to the GE-635 GECOS system by
Rudd Canaday and others at Bell Labs [Canaday 69]; during the final throes of
Multics's life at Bell Labs and immediately after, it was the language of
choice among the group of people who would later become involved with Unix.
BCPL, B, and C all fit
firmly in the traditional procedural family typified by Fortran and Algol 60.
They are particularly oriented towards system programming, are small and
compactly described, and are amenable to translation by simple compilers. They
are `close to the machine' in that the abstractions they introduce are readily
grounded in the concrete data types and operations supplied by conventional
computers, and they rely on library routines for input-output and other
interactions with an operating system. With less success, they also use library
procedures to specify interesting control constructs such as coroutines and
procedure closures. At the same time, their abstractions lie at a sufficiently
high level that, with care, portability between machines can be achieved.
BCPL, B and C differ
syntactically in many details, but broadly they are similar. Programs consist
of a sequence of global declarations and function (procedure) declarations.
Procedures can be nested in BCPL, but may not refer to non-static objects
defined in containing procedures. B and C avoid this restriction by imposing a
more severe one: no nested procedures at all. Each of the languages (except for
earliest versions of B) recognizes separate compilation, and provides a means
for including text from named files.
Several syntactic and
lexical mechanisms of BCPL are more elegant and regular than those of B and C.
For example, BCPL's procedure and data declarations have a more uniform
structure, and it supplies a more complete set of looping constructs. Although
BCPL programs are notionally supplied from an undelimited stream of characters,
clever rules allow most semicolons to be elided after statements that end on a
line boundary. B and C omit this convenience, and end most statements with
semicolons. In spite of the differences, most of the statements and operators
of BCPL map directly into corresponding B and C.
Some of the structural
differences between BCPL and B stemmed from limitations on intermediate memory.
For example, BCPL declarations may take the form
let P1 be command
and P2 be command
and P3 be command ...
where the program
text represented by the commands contains whole procedures. The subdeclarations
connected by and occur simultaneously, so the name P3 is known
inside procedure P1. Similarly, BCPL can package a
group of declarations and statements into an expression that yields a value,
for example
E1 := valof $( declarations ; commands ; resultis E2 $) + 1
The BCPL
compiler readily handled such constructs by storing and analyzing a parsed representation
of the entire program in memory before producing output. Storage limitations on
the B compiler demanded a one-pass technique in which output was generated as
soon as possible, and the syntactic redesign that made this possible was
carried forward into C.
Certain less pleasant
aspects of BCPL owed to its own technological problems and were consciously
avoided in the design of B. For example, BCPL uses a `global vector' mechanism
for communicating between separately compiled programs. In this scheme, the
programmer explicitly associates the name of each externally visible procedure
and data object with a numeric offset in the global vector; the linkage is
accomplished in the compiled code by using these numeric offsets. B evaded this
inconvenience initially by insisting that the entire program be presented all
at once to the compiler. Later implementations of B, and all those of C, use a
conventional linker to resolve external names occurring in files compiled
separately, instead of placing the burden of assigning offsets on the
programmer.
Other fiddles in the
transition from BCPL to B were introduced as a matter of taste, and some remain
controversial, for example the decision to use the single character = for
assignment instead of :=. Similarly, B uses /**/ to
enclose comments, where BCPL uses //, to ignore text up to the end of
the line. The legacy of PL/I is evident here. (C++ has resurrected the BCPL
comment convention.) Fortran influenced the syntax of declarations: B
declarations begin with a specifier like auto or static, followed by a list of names, and C
not only followed this style but ornamented it by placing its type keywords at
the start of declarations.
Not every difference
between the BCPL language documented in Richards's book [Richards 79] and B was
deliberate; we started from an earlier version of BCPL [Richards 67]. For
example, the endcase that escapes from a BCPL switchon
statement was not present in the language when we learned it in the 1960s, and
so the overloading of the break keyword to escape from the B and C switch
statement owes to divergent evolution rather than conscious change.
In contrast to the
pervasive syntax variation that occurred during the creation of B, the core
semantic content of BCPL (its type structure and expression evaluation rules) remained
intact. Both languages are typeless, or rather have a single data type, the
`word,' or `cell,' a fixed-length bit pattern. Memory in these languages
consists of a linear array of such cells, and the meaning of the contents of a
cell depends on the operation applied. The + operator, for example, simply adds
its operands using the machine's integer add instruction, and the other
arithmetic operations are equally unconscious of the actual meaning of their
operands. Because memory is a linear array, it is possible to interpret the
value in a cell as an index in this array, and BCPL supplies an operator for
this purpose. In the original language it was spelled rv, and
later !, while B uses the unary *. Thus, if p is a cell
containing the index of (or address of, or pointer to) another cell, *p refers
to the contents of the pointed-to cell, either as a value in an expression or
as the target of an assignment.
Because pointers in BCPL
and B are merely integer indices in the memory array, arithmetic on them is
meaningful: if p is the address of a cell, then p+1 is the
address of the next cell. This convention is the basis for the semantics of
arrays in both languages. When in BCPL one writes
let V = vec 10
or in B,
auto V[10];
the effect
is the same: a cell named V is allocated, then another group of
10 contiguous cells is set aside, and the memory index of the first of these is
placed into V. By a general rule, in B the expression
*(V+i)
adds V and i, and
refers to the i-th location after V. BCPL and B each add special
notation to sweeten such array accesses; in B an equivalent expression is
V[i]
and in BCPL
V!i
This
approach to arrays was unusual even at the time; C would later assimilate it in
an even less conventional way.
None of BCPL, B, or C
supports character data strongly in the language; each treats strings much like
vectors of integers and supplements general rules by a few conventions. In both
BCPL and B a string literal denotes the address of a static area initialized
with the characters of the string, packed into cells. In BCPL, the first packed
byte contains the number of characters in the string; in B, there is no count
and strings are terminated by a special character, which B spelled `*e'. This
change was made partially to avoid the limitation on the length of a string
caused by holding the count in an 8- or 9-bit slot, and partly because
maintaining the count seemed, in our experience, less convenient than using a
terminator.
Individual characters in a
BCPL string were usually manipulated by spreading the string out into another
array, one character per cell, and then repacking it later; B provided
corresponding routines, but people more often used other library functions that
accessed or replaced individual characters in a string.
After the TMG version of B
was working, Thompson rewrote B in itself (a bootstrapping step). During
development, he continually struggled against memory limitations: each language
addition inflated the compiler so it could barely fit, but each rewrite taking
advantage of the feature reduced its size. For example, B introduced
generalized assignment operators, using x=+y to add y to x. The notation came from Algol 68
[Wijngaarden 75] via McIlroy, who had incorporated it into his version of TMG.
(In B and early C, the operator was spelled =+ instead of += ; this
mistake, repaired in 1976, was induced by a seductively easy way of handling
the first form in B's lexical analyzer.)
Thompson went a step
further by inventing the ++ and -- operators, which increment or
decrement; their prefix or postfix position determines whether the alteration
occurs before or after noting the value of the operand. They were not in the
earliest versions of B, but appeared along the way. People often guess that
they were created to use the auto-increment and auto-decrement address modes
provided by the DEC PDP-11 on which C and Unix first became popular. This is
historically impossible, since there was no PDP-11 when B was developed. The
PDP-7, however, did have a few `auto-increment' memory cells, with the property
that an indirect memory reference through them incremented the cell. This
feature probably suggested such operators to Thompson; the generalization to
make them both prefix and postfix was his own. Indeed, the auto-increment cells
were not used directly in implementation of the operators, and a stronger
motivation for the innovation was probably his observation that the translation
of ++x was smaller than that of x=x+1.
The B compiler on the PDP-7
did not generate machine instructions, but instead `threaded code' [Bell 72],
an interpretive scheme in which the compiler's output consists of a sequence of
addresses of code fragments that perform the elementary operations. The
operations typically (in particular for B) act on a simple stack machine.
On the PDP-7 Unix system,
only a few things were written in B except B itself, because the machine was
too small and too slow to do more than experiment; rewriting the operating system
and the utilities wholly into B was too expensive a step to seem feasible. At
some point Thompson relieved the address-space crunch by offering a `virtual B'
compiler that allowed the interpreted program to occupy more than 8K bytes by
paging the code and data within the interpreter, but it was too slow to be
practical for the common utilities. Still, some utilities written in B
appeared, including an early version of the variable-precision calculator dc
familiar to Unix users [McIlroy 79]. The most ambitious enterprise I undertook
was a genuine cross-compiler that translated B to GE-635 machine instructions,
not threaded code. It was a small tour de force: a full B compiler,
written in its own language and generating code for a 36-bit mainframe, that ran
on an 18-bit machine with 4K words of user address space. This project was
possible only because of the simplicity of the B language and its run-time
system.
Although we entertained
occasional thoughts about implementing one of the major languages of the time
like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our
resources: much simpler and smaller tools were called for. All these languages
influenced our work, but it was more fun to do things on our own.
By 1970, the Unix project
had shown enough promise that we were able to acquire the new DEC PDP-11. The
processor was among the first of its line delivered by DEC, and three months
passed before its disk arrived. Making B programs run on it using the threaded
technique required only writing the code fragments for the operators, and a
simple assembler which I coded in B; soon, dc became the first
interesting program to be tested, before any operating system, on our PDP-11.
Almost as rapidly, still waiting for the disk, Thompson recoded the Unix kernel
and some basic commands in PDP-11 assembly language. Of the 24K bytes of memory
on the machine, the earliest PDP-11 Unix system used 12K bytes for the
operating system, a tiny space for user programs, and the remainder as a RAM
disk. This version was only for testing, not for real work; the machine marked
time by enumerating closed knight's tours on chess boards of various sizes.
Once its disk appeared, we quickly migrated to it after transliterating
assembly-language commands to the PDP-11 dialect, and porting those already in
B.
By 1971, our miniature
computer center was beginning to have users. We all wanted to create
interesting software more easily. Using assembler was dreary enough that B,
despite its performance problems, had been supplemented by a small library of
useful service routines and was being used for more and more new programs.
Among the more notable results of this period was Steve Johnson's first version
of the yacc parser-generator [Johnson 79a].
The machines on which we
first used BCPL and then B were word-addressed, and these languages' single
data type, the `cell,' comfortably equated with the hardware machine word. The
advent of the PDP-11 exposed several inadequacies of B's semantic model. First,
its character-handling mechanisms, inherited with few changes from BCPL, were
clumsy: using library procedures to spread packed strings into individual cells
and then repack, or to access and replace individual characters, began to feel
awkward, even silly, on a byte-oriented machine.
Second, although the
original PDP-11 did not provide for floating-point arithmetic, the manufacturer
promised that it would soon be available. Floating-point operations had been
added to BCPL in our Multics and GCOS compilers by defining special operators,
but the mechanism was possible only because on the relevant machines, a single
word was large enough to contain a floating-point number; this was not true on
the 16-bit PDP-11.
Finally, the B and BCPL
model implied overhead in dealing with pointers: the language rules, by
defining a pointer as an index in an array of words, forced pointers to be
represented as word indices. Each pointer reference generated a run-time scale
conversion from the pointer to the byte address expected by the hardware.
For all these reasons, it
seemed that a typing scheme was necessary to cope with characters and byte
addressing, and to prepare for the coming floating-point hardware. Other
issues, particularly type safety and interface checking, did not seem as
important then as they became later.
Aside from the problems
with the language itself, the B compiler's threaded-code technique yielded
programs so much slower than their assembly-language counterparts that we
discounted the possibility of recoding the operating system or its central
utilities in B.
In 1971 I began to extend
the B language by adding a character type and also rewrote its compiler to
generate PDP-11 machine instructions instead of threaded code. Thus the
transition from B to C was contemporaneous with the creation of a compiler
capable of producing programs fast and small enough to compete with assembly
language. I called the slightly-extended language NB, for `new B.'
NB existed so briefly that
no full description of it was written. It supplied the types int and char,
arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The
semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray
create cells dynamically initialized with a value pointing to the first of a
sequence of 10 integers and characters respectively. The declarations for ipointer
and cpointer omit the size, to assert that no storage should be allocated
automatically. Within procedures, the language's interpretation of the pointers
was identical to that of the array variables: a pointer declaration created a
cell differing from an array declaration only in that the programmer was
expected to assign a referent, instead of letting the compiler allocate the
space and initialize the cell.
Values stored in the cells
bound to array and pointer names were the machine addresses, measured in bytes,
of the corresponding storage area. Therefore, indirection through a pointer
implied no run-time overhead to scale the pointer from word to byte offset. On
the other hand, the machine code for array subscripting and pointer arithmetic
now depended on the type of the array or the pointer: to compute iarray[i]
or ipointer+i implied scaling the addend i by the
size of the object referred to.
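
The scaling rule still holds in present-day C. The following is a minimal sketch, not NB code: the helper names `int_step_bytes` and `char_step_bytes` are invented for illustration, and the arrays reuse the names from the declarations above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes does "ipointer + i" advance? */
size_t int_step_bytes(size_t i)
{
    int iarray[10];
    int *ipointer = iarray;            /* points at iarray[0] */
    assert(i <= 10);                   /* stay within, or one past, the array */
    /* The compiler scales i by sizeof(int) when forming ipointer + i. */
    return (size_t)((char *)(ipointer + i) - (char *)ipointer);
}

/* Same question for a char pointer: the scale factor is 1. */
size_t char_step_bytes(size_t i)
{
    char carray[10];
    char *cpointer = carray;
    assert(i <= 10);
    return (size_t)((cpointer + i) - cpointer);
}
```

Here `int_step_bytes(3)` equals `3 * sizeof(int)`, while `char_step_bytes(3)` equals 3: the addend is scaled by the size of the object referred to.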
These semantics represented
an easy transition from B, and I experimented with them for some months.
Problems became evident when I tried to extend the type notation, especially to
add structured (record) types. Structures, it seemed, should map in an
intuitive way onto memory in the machine, but in a structure containing an
array, there was no good place to stash the pointer containing the base of the
array, nor any convenient way to arrange that it be initialized. For example,
the directory entries of early Unix systems might be described in C as
struct {
    int inumber;
    char name[14];
};
I wanted the
structure not merely to characterize an abstract object but also to describe a
collection of bits that might be read from a directory. Where could the
compiler hide the pointer to name that the semantics demanded? Even
if structures were thought of more abstractly, and the space for pointers could
be hidden somehow, how could I handle the technical problem of properly
initializing these pointers when allocating a complicated object, perhaps one
that specified structures containing arrays containing structures to arbitrary
depth?
The solution constituted
the crucial jump in the evolutionary chain between typeless BCPL and typed C.
It eliminated the materialization of the pointer in storage, and instead caused
the creation of the pointer when the array name is mentioned in an expression.
The rule, which survives in today's C, is that values of array type are
converted, when they appear in expressions, into pointers to the first of the
objects making up the array.
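
As a sketch of the rule in present-day C, using the directory-entry layout from the text: the structure tag `dir_entry` and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct dir_entry {                     /* illustrative tag; layout follows the text */
    int  inumber;
    char name[14];
};

/* The array member converts to a pointer to its first element
   only when it is mentioned in an expression. */
int name_converts_to_first_char(void)
{
    struct dir_entry d = { 1, "unix" };
    char *p = d.name;                  /* conversion happens here; no & needed */
    assert(p != NULL);
    return p == &d.name[0] && *p == 'u';
}

/* No pointer is materialized in storage: the member occupies exactly
   its 14 characters inside the structure. (sizeof is one context in
   which the conversion does not take place.) */
int name_is_stored_inline(void)
{
    struct dir_entry d = { 0, "" };
    return sizeof d.name == 14
        && offsetof(struct dir_entry, name) >= sizeof(int);
}
```

Both functions return 1, which is the point of the design: the structure describes the bytes as they would be read from a directory, and the pointer exists only transiently in expressions.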
This invention enabled most
existing B code to continue to work, despite the underlying shift in the
language's semantics. The few programs that assigned new values to an array
name to adjust its origin (possible in B and BCPL, meaningless in C) were easily
repaired. More important, the new language retained a coherent and workable (if
unusual) explanation of the semantics of arrays, while opening the way to a
more comprehensive type structure.
The second innovation that
most clearly distinguishes C from its predecessors is this fuller type structure
and especially its expression in the syntax of declarations. NB offered the
basic types int and char, together with arrays of them, and
pointers to them, but no further ways of composition. Generalization was
required: given an object of any type, it should be possible to describe a new
object that gathers several into an array, yields it from a function, or is a
pointer to it.
For each object of such a
composed type, there was already a way to mention the underlying object: index
the array, call the function, use the indirection operator on the pointer.
Analogical reasoning led to a declaration syntax for names mirroring that of
the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an
integer, a pointer to an integer, a pointer to a pointer to an integer. The
syntax of these declarations reflects the observation that i, *pi, and **ppi all
yield an int type when used in an expression. Similarly,
int f(), *f(), (*f)();
declare a
function returning an integer, a function returning a pointer to an integer, a
pointer to a function returning an integer;
int *api[10], (*pai)[10];
declare an
array of pointers to integers, and a pointer to an array of integers. In all
these cases the declaration of a variable resembles its usage in an expression
whose type is the one named at the head of the declaration.
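
The correspondence between declaration and use can be checked directly. This is a small sketch in present-day C; the function `seven` and the wrapper `declaration_mirrors_use` are hypothetical names, and `(void)` parameter lists are used in place of the empty parentheses shown above.

```c
#include <assert.h>

static int seven(void) { return 7; }   /* hypothetical function for the demo */

int declaration_mirrors_use(void)
{
    int i = 1, *pi = &i, **ppi = &pi;
    int (*pf)(void) = seven;           /* pointer to a function returning int */
    int x = 2, y = 3;
    int *api[10] = { &x, &y };         /* array of pointers to int */
    int row[10] = { 0 };
    int (*pai)[10] = &row;             /* pointer to an array of 10 ints */

    row[4] = 5;
    assert(*pi == i && **ppi == i);

    /* Each operand is written just as in its declaration,
       and each has type int: 1 + 1 + 1 + 7 + 3 + 5. */
    return i + *pi + **ppi + (*pf)() + *api[1] + (*pai)[4];
}
```

The expression in the return statement reuses the declarator shapes verbatim, and the sum is 18.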
The scheme of type
composition adopted by C owes considerable debt to Algol 68, although it did
not, perhaps, emerge in a form that Algol's adherents would approve of. The
central notion I captured from Algol was a type structure based on atomic types
(including structures), composed into arrays, pointers (references), and
functions (procedures). Algol 68's concept of unions and casts also had an
influence that appeared later.
After creating the type
system, the associated syntax, and the compiler for the new language, I felt
that it deserved a new name; NB seemed insufficiently distinctive. I decided to
follow the single-letter style and called it C, leaving open the question
whether the name represented a progression through the alphabet or through the
letters in BCPL.
Rapid changes continued
after the language had been named, for example the introduction of the &&
and || operators. In BCPL and B, the evaluation of expressions depends on
context: within if and other conditional statements that compare
an expression's value with zero, these languages place a special interpretation
on the and (&) and or (|) operators. In ordinary contexts,
they operate bitwise, but in the B statement
if (e1 & e2) ...
the
compiler must evaluate e1 and if it is non-zero, evaluate e2, and if
it too is non-zero, elaborate the statement dependent on the if. The
requirement descends recursively on & and | operators within e1 and e2. The
short-circuit semantics of the Boolean operators in such `truth-value' context
seemed desirable, but the overloading of the operators was difficult to explain
and use. At the suggestion of Alan Snyder, I introduced the &&
and || operators to make the mechanism more explicit.
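
The difference between the two operators in C can be observed by counting how often the right operand is evaluated. This is a minimal sketch; the counter and function names are invented for the illustration.

```c
#include <assert.h>

static int calls;                      /* counts evaluations of the right operand */

static int right_operand(void)
{
    calls++;
    return 1;
}

/* How many times is e2 evaluated in "if (e1 && e2)"? */
int evaluations_with_logical_and(int e1)
{
    calls = 0;
    if (e1 && right_operand()) { /* body irrelevant */ }
    return calls;
}

/* With bitwise & in C, both operands are always evaluated. */
int evaluations_with_bitwise_and(int e1)
{
    calls = 0;
    if (e1 & right_operand()) { /* body irrelevant */ }
    return calls;
}
```

With `e1` equal to zero, the `&&` version never evaluates the right operand, while the `&` version evaluates it once.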
Their tardy introduction
explains an infelicity of C's precedence rules. In B one writes
if (a==b & c) ...
to check
whether a equals b and c is non-zero; in such a conditional
expression it is better that & have lower precedence than ==. In
converting from B to C, one wants to replace & by && in such a statement; to make the
conversion less painful, we decided to keep the precedence of the &
operator the same relative to ==, and merely split the precedence of
&& slightly from &. Today, it seems that it would have
been preferable to move the relative precedences of & and ==, and
thereby simplify a common C idiom: to test a masked value against another
value, one must write
if ((a&mask) == b) ...
where the inner
parentheses are required but easily forgotten.
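
A short sketch of the pitfall, with hypothetical function names. The first function writes out explicitly how the unparenthesized form `a & mask == b` is parsed, because many compilers warn about the bare form.

```c
#include <assert.h>

/* What "a & mask == b" actually means: == binds tighter than &. */
int as_parsed_without_parens(int a, int mask, int b)
{
    return a & (mask == b);
}

/* What the programmer almost always intends. */
int masked_compare(int a, int mask, int b)
{
    return (a & mask) == b;
}
```

For `a = 0x12`, `mask = 0x0F`, `b = 0x02`, the intended test yields 1, but the form without inner parentheses yields 0, since `mask == b` is false and `a & 0` is 0.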
Many other changes occurred
around 1972-3, but the most important was the introduction of the preprocessor,
partly at the urging of Alan Snyder [Snyder 74], but also in recognition of the
utility of the file-inclusion mechanisms available in BCPL and PL/I. Its
original version was exceedingly simple, and provided only included files and
simple string replacements: #include and #define of parameterless macros. Soon
thereafter, it was extended, mostly by Mike Lesk and then by John Reiser, to
incorporate macros with arguments and conditional compilation. The preprocessor
was originally considered an optional adjunct to the language itself. Indeed,
for some years, it was not even invoked unless the source program contained a
special signal at its beginning. This attitude persisted, and explains both the
incomplete integration of the syntax of the preprocessor with the rest of the
language and the imprecision of its description in early reference manuals.
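
The three stages of preprocessor capability mentioned above can be sketched as follows. The macro names and the configuration symbol `SMALL_MACHINE` are assumptions for this example, not taken from any historical source.

```c
#include <assert.h>

#define BLOCK_BYTES 512                       /* parameterless macro: simple replacement */
#define MAX(a, b) ((a) > (b) ? (a) : (b))     /* macro with arguments */

#ifdef SMALL_MACHINE                          /* conditional compilation */
#define NBUF 2
#else
#define NBUF 8
#endif

int total_buffer_bytes(void) { return BLOCK_BYTES * NBUF; }

int larger(int x, int y) { return MAX(x, y); }
```

Unless `SMALL_MACHINE` is defined when compiling, `total_buffer_bytes()` is 4096. Note that the macro definitions follow line-oriented rules of their own rather than the statement syntax of the language, which reflects the incomplete integration described above.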
By early 1973, the
essentials of modern C were complete. The language and compiler were strong
enough to permit us to rewrite the Unix kernel for the PDP-11 in C during the
summer of that year. (Thompson had made a brief attempt to produce a system
coded in an early version of C (before structures) in 1972, but gave up the
effort.) Also during this period, the compiler was retargeted to other nearby
machines, particularly the Honeywell 635 and IBM 360/370; because the language
could not live in isolation, the prototypes for the modern libraries were
developed. In particular, Lesk wrote a `portable I/O package' [Lesk 72] that
was later reworked to become the C `standard I/O' routines. In 1978 Brian
Kernighan and I published The C Programming Language [Kernighan 78].
Although it did not describe some additions that soon became common, this book
served as the language reference until a formal standard was adopted more than
ten years later. Although we worked closely together on this book, there was a clear
division of labor: Kernighan wrote almost all the expository material, while I
was responsible for the appendix containing the reference manual and the
chapter on interfacing with the Unix system.
During 1973-1980, the
language grew a bit: the type structure gained unsigned, long, union, and
enumeration types, and structures became nearly first-class objects (lacking
only a notation for literals). Equally important developments appeared in its
environment and the accompanying technology. Writing the Unix kernel in C had
given us enough confidence in the language's usefulness and efficiency that we
began to recode the system's utilities and tools as well, and then to move the
most interesting among them to the other platforms. As described in [Johnson 78a],
we discovered that the hardest problems in propagating Unix tools lay not in
the interaction of the C language with new hardware, but in adapting to the
existing software of other operating systems. Thus Steve Johnson began to work
on pcc, a C compiler intended to be easy to retarget to new machines
[Johnson 78b], while he, Thompson, and I began to move the Unix system itself
to the Interdata 8/32 computer.
The language changes during
this period, especially around 1977, were largely focused on considerations of
portability and type safety, in an effort to cope with the problems we foresaw
and observed in moving a considerable body of code to the new Interdata
platform. C at that time still manifested strong signs of its typeless origins.
Pointers, for example, were barely distinguished from integral memory indices
in early language manuals or extant code; the similarity of the arithmetic
properties of character pointers and unsigned integers made it hard to resist
the temptation to identify them. The unsigned types were added to make unsigned
arithmetic available without confusing it with pointer manipulation. Similarly,
the early language condoned assignments between integers and pointers, but this
practice began to be discouraged; a notation for type conversions (called
`casts' from the example of Algol 68) was invented to specify type conversions
more explicitly. Beguiled by the example of PL/I, early C did not tie structure
pointers firmly to the structures they pointed to, and permitted programmers to
write pointer->member almost without regard to the type of pointer;
such an expression was taken uncritically as a reference to a region of memory
designated by the pointer, while the member name specified only an offset and a
type.
Although the first edition
of K&R described most of the rules that brought C's type structure to its
present form, many programs written in the older, more relaxed style persisted,
and so did compilers that tolerated it. To encourage people to pay more attention
to the official language rules, to detect legal but suspicious constructions,
and to help find interface mismatches undetectable with simple mechanisms for
separate compilation, Steve Johnson adapted his pcc compiler to produce lint
[Johnson 79b], which scanned a set of files and remarked on dubious
constructions.
The success of our
portability experiment on the Interdata 8/32 soon led to another by Tom London
and John Reiser on the DEC VAX 11/780. This machine became much more popular
than the Interdata, and Unix and the C language began to spread rapidly, both
within AT&T and outside. Although by the middle 1970s Unix was in use by a
variety of projects within the Bell System as well as a small group of
research-oriented industrial, academic, and government organizations outside
our company, its real growth began only after portability had been achieved. Of
particular note were the System III and System V versions of the system from
the emerging Computer Systems division of AT&T, based on work by the
company's development and research groups, and the BSD series of releases by
the University of California at Berkeley that derived from research
organizations in Bell Laboratories.
During the 1980s the use of
the C language spread widely, and compilers became available on nearly every
machine architecture and operating system; in particular it became popular as a
programming tool for personal computers, both for manufacturers of commercial
software for these machines, and for end-users interested in programming. At
the start of the decade, nearly every compiler was based on Johnson's pcc;
by 1985 there were many independently-produced compiler products.
By 1982 it was clear that C
needed formal standardization. The best approximation to a standard, the first
edition of K&R, no longer described the language in actual use; in
particular, it mentioned neither the void nor the enum types. While it foreshadowed the
newer approach to structures, only after it was published did the language
support assigning them, passing them to and from functions, and associating the
names of members firmly with the structure or union containing them. Although
compilers distributed by AT&T incorporated these changes, and most of the
purveyors of compilers not based on pcc quickly picked them up, there
remained no complete, authoritative description of the language.
The first edition of
K&R was also insufficiently precise on many details of the language, and it
became increasingly impractical to regard pcc as a `reference compiler;'
it did not perfectly embody even the language described by K&R, let alone
subsequent extensions. Finally, the incipient use of C in projects subject to
commercial and government contract meant that the imprimatur of an official
standard was important. Thus (at the urging of M. D. McIlroy), ANSI established
the X3J11 committee under the direction of CBEMA in the summer of 1983, with
the goal of producing a C standard. X3J11 produced its report [ANSI 89] at the
end of 1989, and subsequently this standard was accepted by ISO as ISO/IEC
9899-1990.
From the beginning, the
X3J11 committee took a cautious, conservative view of language extensions. Much
to my satisfaction, they took seriously their goal: `to develop a clear, consistent,
and unambiguous Standard for the C programming language which codifies the
common, existing definition of C and which promotes the portability of user
programs across C language environments.' [ANSI 89] The committee realized that
mere promulgation of a standard does not make the world change.
X3J11 introduced only one
genuinely important change to the language itself: it incorporated the types of
formal arguments in the type signature of a function, using syntax borrowed
from C++ [Stroustrup 86]. In the old style, external functions were declared
like this:
double sin();
which says
only that sin is a function returning a double (that is, double-precision
floating-point) value. In the new style, this is better rendered
double sin(double);
to make the
argument type explicit and thus encourage better type checking and appropriate
conversion. Even this addition, though it produced a noticeably better
language, caused difficulties. The committee justifiably felt that simply
outlawing `old-style' function definitions and declarations was not feasible,
yet also agreed that the new forms were better. The inevitable compromise was
as good as it could have been, though the language definition is complicated by
permitting both forms, and writers of portable software must contend with
compilers not yet brought up to standard.
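
A minimal sketch of what the new-style declaration buys, using a hypothetical function `half` rather than the library's `sin`. Only the prototype form is shown; with an old-style declaration the same call would have passed an unconverted int to a function expecting a double, and the compiler would not have noticed.

```c
#include <assert.h>

double half(double);                   /* new style: the argument type is declared */

double half(double x)
{
    return x / 2.0;
}

/* With the prototype in scope, the int argument 3 is converted
   to double before the call. */
double half_of_three(void)
{
    return half(3);
}
```

`half_of_three()` returns 1.5, because the compiler performs the appropriate conversion at the call site.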
X3J11 also introduced a
host of smaller additions and adjustments, for example, the type qualifiers const and volatile,
and slightly different type promotion rules. Nevertheless, the standardization
process did not change the character of the language. In particular, the C
standard did not attempt to specify formally the language semantics, and so
there can be dispute over fine points; nevertheless, it successfully accounted
for changes in usage since the original description, and is sufficiently
precise to base implementations on it.
Thus the core C language
escaped nearly unscathed from the standardization process, and the Standard
emerged more as a better, careful codification than a new invention. More
important changes took place in the language's surroundings: the preprocessor
and the library. The preprocessor performs macro substitution, using
conventions distinct from the rest of the language. Its interaction with the
compiler had never been well-described, and X3J11 attempted to remedy the
situation. The result is noticeably better than the explanation in the first
edition of K&R; besides being more comprehensive, it provides operations,
like token concatenation, previously available only by accidents of
implementation.
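
Token concatenation, as standardized, is written with the `##` operator. This is a small sketch; the macro name `CONCAT` and the variables are invented for the example.

```c
#include <assert.h>

#define CONCAT(a, b) a##b              /* ## pastes two tokens into one */

int counter1 = 10;
int counter2 = 20;

int read_counters(void)
{
    /* CONCAT(counter, 1) becomes the single identifier counter1. */
    return CONCAT(counter, 1) + CONCAT(counter, 2);
}
```

`read_counters()` returns 30: each macro call expands to an ordinary identifier before the compiler proper sees it.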
X3J11 correctly believed
that a full and careful description of a standard C library was as important as
its work on the language itself. The C language itself does not provide for
input-output or any other interaction with the outside world, and thus depends
on a set of standard procedures. At the time of publication of K&R, C was
thought of mainly as the system programming language of Unix; although we
provided examples of library routines intended to be readily transportable to
other operating systems, underlying support from Unix was implicitly
understood. Thus, the X3J11 committee spent much of its time designing and
documenting a set of library routines required to be available in all
conforming implementations.
By the rules of the
standards process, the current activity of the X3J11 committee is confined to
issuing interpretations on the existing standard. However, an informal group
originally convened by Rex Jaeschke as NCEG (Numerical C Extensions Group) has
been officially accepted as subgroup X3J11.1, and they continue to consider
extensions to C. As the name implies, many of these possible extensions are
intended to make the language more suitable for numerical use: for example,
multi-dimensional arrays whose bounds are dynamically determined, incorporation
of facilities for dealing with IEEE arithmetic, and making the language more
effective on machines with vector or other advanced architectural features. Not
all the possible extensions are specifically numerical; they include a notation
for structure literals.
C and even B have several
direct descendants, though they do not rival Pascal in generating progeny. One
side branch developed early. When Steve Johnson visited the University of Waterloo
on sabbatical in 1972, he brought B with him. It became popular on the
Honeywell machines there, and later spawned Eh and Zed (the Canadian answers to
`what follows B?'). When Johnson returned to Bell Labs in 1973, he was
disconcerted to find that the language whose seeds he brought to Canada had
evolved back home; even his own yacc program had been rewritten in C, by
Alan Snyder.
More recent descendants of
C proper include Concurrent C [Gehani 89], Objective C [Cox 86], C* [Thinking
90], and especially C++ [Stroustrup 86]. The language is also widely used as an
intermediate representation (essentially, as a portable assembly language) for
a wide variety of compilers, both for direct descendants like C++, and
independent languages like Modula 3 [Nelson 91] and Eiffel [Meyer 88].
Two ideas are most
characteristic of C among languages of its class: the relationship between
arrays and pointers, and the way in which declaration syntax mimics expression
syntax. They are also among its most frequently criticized features, and often
serve as stumbling blocks to the beginner. In both cases, historical accidents
or mistakes have exacerbated their difficulty. The most important of these has
been the tolerance of C compilers to errors in type. As should be clear from
the history above, C evolved from typeless languages. It did not suddenly
appear to its earliest users and developers as an entirely new language with
its own rules; instead we continually had to adapt existing programs as the
language developed, and make allowance for an existing body of code. (Later,
the ANSI X3J11 committee standardizing C would face the same problem.)
Compilers in 1977, and even
well after, did not complain about usages such as assigning between integers
and pointers or using objects of the wrong type to refer to structure members.
Although the language definition presented in the first edition of K&R was
reasonably (though not completely) coherent in its treatment of type rules,
that book admitted that existing compilers didn't enforce them. Moreover, some
rules designed to ease early transitions contributed to later confusion. For
example, the empty square brackets in the function declaration
int f(a) int a[]; { ... }
are a
living fossil, a remnant of NB's way of declaring a pointer; a is, in
this special case only, interpreted in C as a pointer. The notation survived in
part for the sake of compatibility, in part under the rationalization that it
would allow programmers to communicate to their readers an intent to pass f a pointer
generated from an array, rather than a reference to a single integer.
Unfortunately, it serves as much to confuse the learner as to alert the reader.
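
The special case can be demonstrated in present-day C. This sketch uses prototype-style definitions rather than the old-style form shown above, and the function names are hypothetical.

```c
#include <assert.h>

int first_plus_last(int a[], int n);   /* declared with empty brackets */

/* Defined with an explicit pointer: the two declarations are compatible
   because the parameter "int a[]" is interpreted as "int *a". */
int first_plus_last(int *a, int n)
{
    assert(n > 0);
    a[0] += 1;                         /* changes the caller's array: nothing was copied */
    return a[0] + a[n - 1];
}

int demo_array_parameter(void)
{
    int v[3] = { 1, 2, 3 };
    int r = first_plus_last(v, 3);     /* v converts to &v[0]; r is 2 + 3 */
    return r * 10 + v[0];              /* v[0] is now 2 */
}
```

`demo_array_parameter()` returns 52. The compiler accepts the bracketed declaration and the pointer definition as the same function type, and the callee's write to `a[0]` is visible in the caller.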
In K&R C, supplying
arguments of the proper type to a function call was the responsibility of the
programmer, and the extant compilers did not check for type agreement. The
failure of the original language to include argument types in the type
signature of a function was a significant weakness, indeed the one that
required the X3J11 committee's boldest and most painful innovation to repair.
The early design is explained (if not justified) by my avoidance of
technological problems, especially cross-checking between separately-compiled
source files, and my incomplete assimilation of the implications of moving
from an untyped to a typed language. The lint program, mentioned
above, tried to alleviate the problem: among its other functions, lint
checks the consistency and coherency of a whole program by scanning a set of
source files, comparing the types of function arguments used in calls with
those in their definitions.
An accident of syntax
contributed to the perceived complexity of the language. The indirection
operator, spelled * in C, is syntactically a unary
prefix operator, just as in BCPL and B. This works well in simple expressions,
but in more complex cases, parentheses are required to direct the parsing. For
example, to distinguish indirection through the value returned by a function
from calling a function designated by a pointer, one writes *fp() and (*pf)()
respectively. The style used in expressions carries through to declarations, so
the names might be declared
int *fp();
int (*pf)();
In more
ornate but still realistic cases, things become worse:
int *(*pfp)();
is a
pointer to a function returning a pointer to an integer. There are two effects
occurring. Most important, C has a relatively rich set of ways of describing
types (compared, say, with Pascal). Declarations in languages as expressive as
C (Algol 68, for example) describe objects equally hard to understand, simply
because the objects themselves are complex. A second effect owes to details of
the syntax. Declarations in C must be read in an `inside-out' style that many
find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of
the nested declarations and expressions would become simpler if the indirection
operator had been taken as a postfix operator instead of prefix, but by then it
was too late to change.
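
One common way to tame the inside-out reading is to build the type in steps with typedefs. This is a sketch under assumptions: the typedef and function names are invented, and `(void)` parameter lists replace the empty parentheses of the text.

```c
#include <assert.h>

static int value = 42;

static int *address_of_value(void)     /* a function returning a pointer to int */
{
    return &value;
}

/* Read from the name outward: pfp is a pointer (*pfp) to a function
   (*pfp)(void) returning a pointer to int. */
int *(*pfp)(void) = address_of_value;

/* The same type assembled in two named steps. */
typedef int *int_ptr;
typedef int_ptr (*func_returning_int_ptr)(void);

int call_both_ways(void)
{
    func_returning_int_ptr same = pfp; /* identical type, so no cast is needed */
    assert(same == pfp);
    return *(*pfp)() + *same();        /* 42 + 42 */
}
```

`call_both_ways()` returns 84. The assignment without a cast shows that the typedef version names exactly the type declared by `int *(*pfp)(void)`.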
In spite of its
difficulties, I believe that C's approach to declarations remains
plausible, and am comfortable with it; it is a useful unifying principle.
The other characteristic
feature of C, its treatment of arrays, is more suspect on practical grounds,
though it also has real virtues. Although the relationship between pointers and
arrays is unusual, it can be learned. Moreover, the language shows considerable
power to describe important concepts, for example, vectors whose length varies
at run time, with only a few basic rules and conventions. In particular,
character strings are handled by the same mechanisms as any other array, plus
the convention that a null character terminates a string. It is interesting to
compare C's approach with that of two nearly contemporaneous languages, Algol
68 and Pascal [Jensen 74]. Arrays in Algol 68 either have fixed bounds, or are
`flexible:' considerable mechanism is required both in the language definition,
and in compilers, to accommodate flexible arrays (and not all compilers fully
implement them). Original Pascal had only fixed-sized arrays and strings, and
this proved confining [Kernighan 81]. Later, this was partially fixed, though
the resulting language is not yet universally available.
C treats strings as arrays
of characters conventionally terminated by a marker. Aside from one special
rule about initialization by string literals, the semantics of strings are
fully subsumed by more general rules governing all arrays, and as a result the
language is simpler to describe and to translate than one incorporating the
string as a unique data type. Some costs accrue from its approach: certain
string operations are more expensive than in other designs because application
code or a library routine must occasionally search for the end of a string,
because few built-in operations are available, and because the burden of
storage management for strings falls more heavily on the user. Nevertheless,
C's approach to strings works well.
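
Both the cost and the special initialization rule can be sketched briefly. `my_strlen` is a hypothetical re-implementation of the idea behind the library's `strlen`, written only to make the scan visible.

```c
#include <assert.h>
#include <stddef.h>

/* Finding the length means searching for the terminating null
   character, so the cost grows with the length of the string. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* The one special rule: a string literal may initialize a char array,
   which is then sized to include the terminator. */
size_t literal_array_size(void)
{
    char greeting[] = "hello";
    return sizeof greeting;            /* 5 characters plus 1 null */
}
```

`my_strlen("hello")` is 5 and `literal_array_size()` is 6; everything else about the string follows from the ordinary rules for arrays of characters.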
On the other hand, C's
treatment of arrays in general (not just strings) has unfortunate implications
both for optimization and for future extensions. The prevalence of pointers in
C programs, whether those declared explicitly or arising from arrays, means
that optimizers must be cautious, and must use careful dataflow techniques to
achieve good results. Sophisticated compilers can understand what most pointers
can possibly change, but some important usages remain difficult to analyze. For
example, functions with pointer arguments derived from arrays are hard to
compile into efficient code on vector machines, because it is seldom possible
to determine that one argument pointer does not overlap data also referred to
by another argument, or accessible externally. More fundamentally, the
definition of C so specifically describes the semantics of arrays that changes
or extensions treating arrays as more primitive objects, and permitting
operations on them as wholes, become hard to fit into the existing language.
Even extensions to permit the declaration and use of multidimensional arrays
whose size is determined dynamically are not entirely straightforward
[MacDonald 89] [Ritchie 90], although they would make it much easier to write
numerical libraries in C. Thus, C covers the most important uses of strings and
arrays arising in practice by a uniform and simple mechanism, but leaves
problems for highly efficient implementations and for extensions.
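
The overlap problem can be made concrete with a small sketch. The function names are hypothetical; the point is that the same loop gives different results depending on whether its pointer arguments refer to overlapping storage, so a compiler cannot freely reorder or vectorize it.

```c
#include <assert.h>

/* dst[i] = src[i] + 1 for i in [0, n). Nothing in the signature tells
   the compiler whether dst and src overlap. */
void add_one(int *dst, const int *src, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* Disjoint arrays: every element could be computed independently. */
int last_when_disjoint(void)
{
    int src[4] = { 0, 0, 0, 0 };
    int dst[4];
    add_one(dst, src, 4);
    return dst[3];
}

/* Overlapping: dst is src shifted by one element, so each store feeds
   the next load and the iterations must run in order. */
int last_when_overlapping(void)
{
    int a[4] = { 0, 0, 0, 0 };
    add_one(a + 1, a, 3);
    return a[3];
}
```

The disjoint call leaves 1 in the last element, while the overlapping call leaves 3, since `a[1]`, `a[2]`, and `a[3]` become 1, 2, and 3 in turn.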
Many smaller infelicities
exist in the language and its description besides those discussed above, of
course. There are also general criticisms to be lodged that transcend detailed
points. Chief among these is that the language and its generally-expected
environment provide little help for writing very large systems. The naming
structure provides only two main levels, `external' (visible everywhere) and `internal'
(within a single procedure). An intermediate level of visibility (within a
single file of data and procedures) is weakly tied to the language definition.
Thus, there is little direct support for modularization, and project designers
are forced to create their own conventions.
Similarly, C itself
provides two durations of storage: `automatic' objects that exist while control
resides in or below a procedure, and `static,' existing throughout execution of
a program. Off-stack, dynamically-allocated storage is provided only by a
library routine and the burden of managing it is placed on the programmer: C is
hostile to automatic garbage collection.
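
The two language-level durations and the library-provided third can be sketched side by side. The function names are invented for illustration; `malloc` and `free` are the standard library routines.

```c
#include <assert.h>
#include <stdlib.h>

/* Static storage: one instance that persists throughout execution. */
int next_static(void)
{
    static int count = 0;
    return ++count;
}

/* Automatic storage: a fresh instance each time control enters. */
int next_automatic(void)
{
    int count = 0;
    return ++count;
}

/* Dynamically allocated storage comes from a library routine, and the
   caller is responsible for releasing it with free().
   Returns NULL if the allocation fails. */
int *make_counter(int start)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = start;
    return p;
}
```

Successive calls to `next_static()` return 1, 2, and so on, whereas `next_automatic()` always returns 1. Whatever `make_counter` returns must eventually be passed to `free` by the programmer; nothing in the language does so automatically.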
C has become successful to
an extent far surpassing any early expectations. What qualities contributed to
its widespread use?
Doubtless the success of
Unix itself was the most important factor; it made the language available to
hundreds of thousands of people. Conversely, of course, Unix's use of C and its
consequent portability to a wide variety of machines was important in the
system's success. But the language's invasion of other environments suggests
more fundamental merits.
Despite some aspects
mysterious to the beginner and occasionally even to the adept, C remains a simple
and small language, translatable with simple and small compilers. Its types and
operations are well-grounded in those provided by real machines, and for people
used to how computers work, learning the idioms for generating time- and
space-efficient programs is not difficult. At the same time the language is
sufficiently abstracted from machine details that program portability can be
achieved.
Equally important, C and
its central library support always remained in touch with a real environment.
It was not designed in isolation to prove a point, or to serve as an example,
but as a tool to write programs that did useful things; it was always meant to
interact with a larger operating system, and was regarded as a tool to build
larger tools. A parsimonious, pragmatic approach influenced the things that
went into C: it covers the essential needs of many programmers, but does not
try to supply too much.
Finally, despite the
changes that it has undergone since its first published description, which was
admittedly informal and incomplete, the actual C language as seen by millions
of users using many different compilers has remained remarkably stable and
unified compared to those of similarly widespread currency, for example Pascal
and Fortran. There are differing dialects of C (most noticeably, those described
by the older K&R and the newer Standard C), but on the whole, C has remained
freer of proprietary extensions than other languages. Perhaps the most
significant extensions are the `far' and `near' pointer qualifications intended
to deal with peculiarities of some Intel processors. Although C was not
originally designed with portability as a prime goal, it succeeded in
expressing programs, even including operating systems, on machines ranging from
the smallest personal computers through the mightiest supercomputers.
C is quirky, flawed, and an
enormous success. While accidents of history surely helped, it evidently
satisfied a need for a system implementation language efficient enough to
displace assembly language, yet sufficiently abstract and fluent to describe
algorithms and interactions in a wide variety of environments.
It is worth summarizing
compactly the roles of the direct contributors to today's C language. Ken
Thompson created the B language in 1969-70; it was derived directly from Martin
Richards's BCPL. Dennis Ritchie turned B into C during 1971-73, keeping most of
B's syntax while adding types and many other changes, and writing the first
compiler. Ritchie, Alan Snyder, Steven C. Johnson, Michael Lesk, and Thompson
contributed language ideas during 1972-1977, and Johnson's portable compiler
remains widely used. During this period, the collection of library routines
grew considerably, thanks to these people and many others at Bell Laboratories.
In 1978, Brian Kernighan and Ritchie wrote the book that became the language
definition for several years. Beginning in 1983, the ANSI X3J11 committee
standardized the language. Especially notable in keeping its efforts on track
were its officers Jim Brodie, Tom Plum, and P. J. Plauger, and the successive
draft redactors, Larry Rosler and Dave Prosser.
I thank Brian Kernighan,
Doug McIlroy, Dave Prosser, Peter Nelson, Rob Pike, Ken Thompson, and HOPL's
referees for advice in the preparation of this paper.
[ANSI 89]
American National Standards Institute, American
National Standard for Information Systems - Programming Language C,
X3.159-1989.
[Anderson 80]
B. Anderson, `Type syntax in the language C: an
object lesson in syntactic innovation,' SIGPLAN Notices 15 (3), March,
1980, pp. 21-27.
[Bell 72]
J. R. Bell, `Threaded Code,' C. ACM 16
(6), pp. 370-372.
[Canaday 69]
R. H. Canaday and D. M. Ritchie, `Bell
Laboratories BCPL,' AT&T Bell Laboratories internal memorandum, May, 1969.
[Corbato 62]
F. J. Corbato, M. Merwin-Dagget, R. C. Daley,
`An Experimental Time-sharing System,' AFIPS Conf. Proc. SJCC, 1962, pp.
335-344.
[Cox 86]
B. J. Cox and A. J. Novobilski, Object-Oriented
Programming: An Evolutionary Approach, Addison-Wesley: Reading, Mass.,
1986. Second edition, 1991.
[Gehani 89]
N. H. Gehani and W. D. Roome, Concurrent C,
Silicon Press: Summit, NJ, 1989.
[Jensen 74]
K. Jensen and N. Wirth, Pascal User Manual
and Report, Springer-Verlag: New York, Heidelberg, Berlin. Second Edition,
1974.
[Johnson 73]
S. C. Johnson and B. W. Kernighan, `The
Programming Language B,' Comp. Sci. Tech. Report #8, AT&T Bell Laboratories
(January 1973).
[Johnson 78a]
S. C. Johnson and D. M. Ritchie, `Portability
of C Programs and the UNIX System,' Bell Sys. Tech. J. 57 (6) (part 2),
July-Aug, 1978.
[Johnson 78b]
S. C. Johnson, `A Portable Compiler: Theory and
Practice,' Proc. 5th ACM POPL Symposium (January 1978).
[Johnson 79a]
S. C. Johnson, `Yet another compiler-compiler,'
in Unix Programmer's Manual, Seventh Edition, Vol. 2A, M. D. McIlroy and
B. W. Kernighan, eds. AT&T Bell Laboratories: Murray Hill, NJ, 1979.
[Johnson 79b]
S. C. Johnson, `Lint, a Program Checker,' in Unix
Programmer's Manual, Seventh Edition, Vol. 2B, M. D. McIlroy and B. W.
Kernighan, eds. AT&T Bell Laboratories: Murray Hill, NJ, 1979.
[Kernighan 78]
B. W. Kernighan and D. M. Ritchie, The C
Programming Language, Prentice-Hall: Englewood Cliffs, NJ, 1978. Second
edition, 1988.
[Kernighan 81]
B. W. Kernighan, `Why Pascal is not my favorite
programming language,' Comp. Sci. Tech. Rep. #100, AT&T Bell Laboratories,
1981.
[Lesk 73]
M. E. Lesk, `A Portable I/O Package,' AT&T
Bell Laboratories internal memorandum ca. 1973.
[MacDonald 89]
T. MacDonald, `Arrays of variable length,' J. C
Lang. Trans 1 (3), Dec. 1989, pp. 215-233.
[McClure 65]
R. M. McClure, `TMG - A Syntax Directed
Compiler,' Proc. 20th ACM National Conf. (1965), pp. 262-274.
[McIlroy 60]
M. D. McIlroy, `Macro Instruction Extensions of
Compiler Languages,' C. ACM 3 (4), pp. 214-220.
[McIlroy 79]
M. D. McIlroy and B. W. Kernighan, eds, Unix
Programmer's Manual, Seventh Edition, Vol. I, AT&T Bell Laboratories:
Murray Hill, NJ, 1979.
[Meyer 88]
B. Meyer, Object-oriented Software
Construction, Prentice-Hall: Englewood Cliffs, NJ, 1988.
[Nelson 91]
G. Nelson, Systems Programming with
Modula-3, Prentice-Hall: Englewood Cliffs, NJ, 1991.
[Organick 75]
E. I. Organick, The Multics System: An
Examination of its Structure, MIT Press: Cambridge, Mass., 1975.
[Richards 67]
M. Richards, `The BCPL Reference Manual,' MIT
Project MAC Memorandum M-352, July 1967.
[Richards 79]
M. Richards and C. Whitbey-Strevens, BCPL:
The Language and its Compiler, Cambridge Univ. Press: Cambridge, 1979.
[Ritchie 78]
D. M. Ritchie, `UNIX: A Retrospective,' Bell
Sys. Tech. J. 57 (6) (part 2), July-Aug, 1978.
[Ritchie 84]
D. M. Ritchie, `The Evolution of the UNIX
Time-sharing System,' AT&T Bell Labs. Tech. J. 63 (8) (part 2), Oct.
1984.
[Ritchie 90]
D. M. Ritchie, `Variable-size arrays in C,' J.
C Lang. Trans. 2 (2), Sept. 1990, pp. 81-86.
[Sethi 81]
R. Sethi, `Uniform syntax for type expressions
and declarators,' Softw. Prac. and Exp. 11 (6), June 1981, pp. 623-628.
[Snyder 74]
A. Snyder, A Portable Compiler for the
Language C, MIT: Cambridge, Mass., 1974.
[Stoy 72]
J. E. Stoy and C. Strachey, `OS6 - An
experimental operating system for a small computer. Part I: General principles
and structure,' Comp J. 15, (Aug. 1972), pp. 117-124.
[Stroustrup 86]
B. Stroustrup, The C++ Programming Language,
Addison-Wesley: Reading, Mass., 1986. Second edition, 1991.
[Thacker 79]
C. P. Thacker, E. M. McCreight, B. W. Lampson,
R. F. Sproull, D. R. Boggs, `Alto: A Personal Computer,' in Computer Structures:
Principles and Examples, D. Sieworek, C. G. Bell, A. Newell, McGraw-Hill:
New York, 1982.
[Thinking 90]
C* Programming Guide, Thinking Machines Corp.: Cambridge
Mass., 1990.
[Thompson 69]
K. Thompson, `Bon - an Interactive Language,'
undated AT&T Bell Laboratories internal memorandum (ca. 1969).
[Wijngaarden 75]
A. van Wijngaarden, B. J. Mailloux, J. E. Peck,
C. H. Koster, M. Sintzoff, C. Lindsey, L. G. Meertens, R. G. Fisker, `Revised
report on the algorithmic language Algol 68,' Acta Informatica 5, pp.
1-236.